Abstract: We study statistical detection of grayscale objects in noisy images.
The object of interest is of unknown shape and has an unknown intensity, which
may vary over the object and may be negative. No boundary shape constraints are
imposed on the object; only a weak bulk condition on the object's interior is
required. We propose an algorithm that can be used to detect grayscale objects
of unknown shape in the presence of nonparametric noise of unknown level. Our
algorithm is based on a nonparametric multiple testing procedure.

We establish the limit of applicability of our method via an explicit, 
closed-form, non-asymptotic and nonparametric consistency bound. This 
bound is valid for a wide class of nonparametric noise distributions. We 
achieve this by proving an uncertainty principle for percolation on finite 
lattices.

1. Introduction

Object detection and image reconstruction for noisy images are two of the cor- 
nerstone problems in image analysis. In this paper, we continue our work on an 
efficient method for quick detection of objects in noisy images. Our approach 
uses mathematical percolation theory. 

Detection of objects in noisy images is the most basic problem of image
analysis. Indeed, when one looks at a noisy image, the first question to ask is
whether there is any object at all. This is also a primary question of interest
in such diverse fields as, for example, cancer detection (Ricci-Vitiani et al.
(2007)), automated urban analysis (Negri et al. (2006)), detection of cracks in
buried pipes (Sinha and Fieguth (2006)), and other possible applications in
astronomy, electron microscopy and neurology. Moreover, if there is just random
noise in the picture, it doesn't make sense to run computationally intensive
procedures for image reconstruction for this particular picture. Surprisingly,
the vast majority of image analysis methods, both in statistics and in
engineering, skip this stage and start immediately with image reconstruction.

The crucial difference of our method is that we do not impose any shape 
or smoothness assumptions on the boundary of the object. This permits the 
detection of nonsmooth, irregular or disconnected objects in noisy images, under 
very mild assumptions on the object's interior. This is especially suitable, for 
example, if one has to detect a highly irregular non-convex object in a noisy 
image. This is usually the case, for example, in the aforementioned fields of 
automated urban analysis, cancer detection and detection of cracks in materials. 
Although our detection procedure works for regular images as well, it is precisely 
the class of irregular images with unknown shape where our method can be very 
advantageous. 

We approached the object detection problem as a hypothesis testing problem
within the class of statistical inverse problems in spatial statistics. We were
able to extend our approach to the nonparametric case of unknown noise density
in Davies et al. (2009) and Langovoy and Wittich (2010a). In Langovoy and
Wittich (2009a) and Davies et al. (2009), this density was not assumed smooth or
even continuous. It is even possible that the noise distribution is
heavy-tailed, see Langovoy and Wittich (2009a), Davies et al. (2009) and
Langovoy and Wittich (2010a).



In Langovoy and Wittich (2010b), we gave an algorithmic implementation of
our nonparametric hypothesis testing procedure. We also provided a program 
that can be used for statistical experiments in image processing. This program 
was written in the statistical programming language R. 

We have shown that there is a deep connection between the spatial structure 
chosen for the discretisation of the image, the type of the noise distribution on 
the image, and statistical properties of object detection. These results seem to 
be of independent interest for the field of spatial statistics. 

In our previous papers, we considered the case of square lattices in Langovoy
and Wittich (2009a) and Langovoy and Wittich (2009b), triangular lattices in
Davies et al. (2009) and Langovoy and Wittich (2010a), and even the case of
general periodic lattices in Langovoy and Wittich (2010a). In all those cases,
we proved that our detection algorithms have linear complexity in terms of the
number of pixels on the screen. These procedures are not only asymptotically
consistent, but on top of that they have accuracy that grows exponentially with
the "number of pixels" in the object of detection. All of our detection
algorithms have a built-in data-driven stopping rule, so there is no need for
human assistance to stop the algorithm at an appropriate step.

In view of the above, our method can be considered as an unsupervised 
learning method, in the language of machine learning. This makes our results 
valuable for the field of machine learning as well. Indeed, we do not only propose 
an unsupervised method, but also prove the method's consistency and even go 
as far as to prove the rates of convergence. 

In our previous papers we assumed that the original image was black-and- 
white and that the noisy image was grayscale. In the present paper, we consider 
the general case where the signal intensity is completely unknown. This intensity
is only assumed to be bounded, but otherwise can vary from pixel to pixel and
can be negative.

We propose a multiple testing procedure for detection of grayscale objects of
unknown varying intensity in grayscale pictures corrupted by a nonparametric
noise that has an unknown distribution. Instead of using a single fixed
threshold, we choose a set of thresholds and perform the maximum cluster test
from Langovoy and Wittich (2010a) for each of those thresholds. We show in this
paper that, under mild model assumptions, if there is an object in the picture,
then it is possible to choose a set of thresholds such that we will consistently
detect this object, whenever the object can be detected, even in principle, on
the basis of the sizes of percolation clusters. This is one of the two parts
that are necessary to prove consistency of the new test.

To establish this result, we need to find out when a signal is too weak so 
that it cannot be detected by our approach. We achieve this goal by proving 
the so-called uncertainty relation for percolation on finite lattices. This is the 
main probabilistic result of the present paper. An important distinction of our 
uncertainty result is that it can be formulated as an explicit condition on the 
noise distribution and the lattice size. Results of this type are very rare both in 
statistical literature and in image analysis research. To the best of our knowledge,
explicit uncertainty bounds were proved only for Gaussian errors (for example, 
in research on wavelets by Donoho and coauthors). Our uncertainty relation is 
much stronger, because it holds uniformly over a wide class of nonparametric 
error distributions. 

Since the problem of detection of greyscale objects cannot be solved in com- 
plete generality, we might also provide a set of necessary conditions on the image 
that makes the object detection possible. We plan to give a possible set of those 
conditions, as well as the full proof of the consistency theorem for our multiple 
testing method, in our forthcoming paper on the subject. 

The paper is organized as follows. Section 2 gives a minimal introduction to
mathematical percolation theory. In Section 3, we review our previous results
on detection of black-and-white objects in noisy images. In Section 4, we
develop an appropriate model for detection of greyscale objects of unknown
varying intensity in greyscale pictures corrupted by nonparametric noise. We
prove consistency of the basic building blocks of our multiple testing
procedure in Theorem 3. The new uncertainty relation for percolation on finite
lattices is proved in Section 5. Theorem 4 of this section is the main
mathematical result of the present paper. A new multiple testing procedure for
statistical image analysis is proposed in Section 6. Some important results
from percolation theory are reviewed in Section 7 of the Appendix; this section
also contains the proof of the uncertainty relation. Section 8 in the Appendix
contains the discussion of bounded detector devices.

2. Percolation theory 

We start with some basic notions of percolation theory. Let $\mathcal{G}$ be an
infinite graph consisting of sites $s \in \mathcal{G}$ and bonds between sites.
The bonds determine the topology of the graph in the following sense: we say
that two sites $s, s' \in \mathcal{G}$ are neighbors if there is a bond
connecting them. We say that a subset $C \subset \mathcal{G}$ of sites is
connected if for any two sites $s, s' \in C$ there are sites
$s_1, \dots, s_n \in C$ such that $s$ and $s_1$, $s_n$ and $s'$, and $s_k$ and
$s_{k+1}$ are neighbors for all $k = 1, \dots, n-1$.

Considering site percolation on the graph means that we consider random
configurations $\omega \in \{0,1\}^{\mathcal{G}}$ where the probabilities are
Bernoulli

$$P(\omega(s) = 1) = p, \qquad P(\omega(s) = 0) = 1 - p$$

independently for each $s \in \mathcal{G}$, where $0 < p < 1$ is a fixed
probability. If $\omega(s) = 1$, we say that the site $s$ is occupied.

Then, under mild assumptions on the graph, there is a phase transition in the
qualitative behaviour of cluster sizes. To be precise, there is a critical
percolation probability $p_c$ such that for $p < p_c$ there is no infinite
connected cluster and for $p > p_c$ there is one.

This statement, and the very definition of $p_c$ as the location of this phase
transition, are only valid for infinite graphs. We cannot even speak of an
infinite connected cluster for finite graphs. However, a qualitative difference
in the sizes of connected clusters of occupied sites can already be seen for
finite graphs, say with $|\mathcal{G}| = N$ sites. In a sense that will be made
precise below, the sizes of connected clusters are typically of order $\log N$
for small $p$ and of order $N$ for values of $p$ close to one. This will yield a
criterion to infer whether $p$ is close to zero or close to one from observed
site configurations. Intuitively, for large enough values of $N$ the distinction
between the two regimes is quite sharp and located very near to the critical
percolation probability of an associated infinite lattice.
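The contrast between the two regimes can be seen in a small simulation. The
following is only an illustrative sketch, not the authors' R program: it
assumes a finite patch of the triangular lattice stored on a skewed square grid
(six neighbors per site), and the names `TRI_NEIGHBORS`, `largest_cluster` and
`simulate` are hypothetical helpers introduced here.

```python
import random

# Six neighbors of a site when the triangular lattice is stored on a skewed
# square grid (an assumed embedding, chosen for convenience).
TRI_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def largest_cluster(sites):
    """Size of the largest connected cluster in a set of (i, j) sites."""
    remaining = set(sites)
    best = 0
    while remaining:
        # Depth-first search over the cluster of an arbitrary remaining site.
        stack = [remaining.pop()]
        size = 0
        while stack:
            x, y = stack.pop()
            size += 1
            for dx, dy in TRI_NEIGHBORS:
                nb = (x + dx, y + dy)
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def simulate(n, p, seed=0):
    """Largest cluster of Bernoulli(p) site percolation on an n x n patch."""
    rng = random.Random(seed)
    occupied = {(i, j) for i in range(n) for j in range(n) if rng.random() < p}
    return largest_cluster(occupied)


if __name__ == "__main__":
    for p in (0.2, 0.8):
        print(f"p = {p}: largest cluster has {simulate(60, p)} sites")
```

For occupation probabilities well below and well above the critical value, the
printed cluster sizes differ by orders of magnitude, which is the effect the
maximum cluster test exploits.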



3. Maximum cluster test, consistency and rates of convergence 



The signal in our previous papers Langovoy and Wittich (2009a), Davies et al.
(2009) and Langovoy and Wittich (2010a) was assumed to be zero-one, which
corresponds to images with only black and white pixels. In this paper, we will
show that the consistency result can be modified to cover also the detection of
grayscale objects of unknown intensity. However, first we have to describe our
constructions for the basic case.

Let $\mathcal{G}$ denote a planar graph. We think of the sites
$s \in \mathcal{G}$ as the pixels of a discretized image and of the graph
topology as indicating neighboring pixels. In our aforementioned papers, we
considered noisy signals of the form

$$Y(s) = 1_{\mathcal{G}_0}(s) + \sigma\varepsilon(s) \tag{1}$$

where $1_{\mathcal{G}_0}$ denotes the indicator function of a subset
$\mathcal{G}_0 \subset \mathcal{G}$, the noise is given by independent,
identically distributed random variables
$\{\varepsilon(s), s \in \mathcal{G}\}$ with $E\varepsilon = 0$ and
$V\varepsilon = 1$, and $\sigma > 0$ is the noise standard deviation (so that
the noise variance is $\sigma^2$). Thus, $\sigma^{-1}$ was a measure for the
signal to noise ratio. We refer to Langovoy and Wittich (2009a) or Langovoy and
Wittich (2010a) for a more detailed introduction.



Definition 1. (The detection problem) For signals of the form (1), we consider
the detection problem, meaning that we construct a test for the following
hypothesis and alternative:

$H_0$: $\mathcal{G}_0 = \emptyset$, i.e. there is no signal.
$H_1$: $\mathcal{G}_0 \neq \emptyset$, i.e. there is a signal.

In our previous work, we constructed tests for the detection problem given in
Definition 1 above and computed explicit upper bounds for the type I and type II
errors under some mild condition on the shape of $\mathcal{G}_0$, called the
bulk condition. We refer to Langovoy and Wittich (2009a) and Langovoy and
Wittich (2010a) for proofs.

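The setup is as follows: $\mathbb{T}^{(N)} \subset \mathbb{T}$ denotes the finite
triangular lattice consisting of the $N^2$ sites $s \in \mathbb{T}$ and bonds
which are contained in the subset

$$\{z \in \mathbb{C} : 0 \le \Re(z) \le N + \tfrac{1}{2},\ 0 \le \Im(z) \le \tfrac{\sqrt{3}}{2} N\}.$$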

By consistency we mean that the test will deliver the correct decision if the
signal can be detected with an arbitrarily high resolution. To be precise, we
think of the signal as a subset $G_0 \subset [0,1]^2$ and write

$$G_0^{(N)} := \{(N + 1/2)\,x + i N \sqrt{3}\, y/2 : (x, y) \in G_0\} \subset \mathbb{C}.$$

The model from equation (1) now depends on $N$ and is given by

$$Y^{(N)}(s) = 1_{\mathcal{G}_0^{(N)}}(s) + \sigma\varepsilon(s) \tag{2}$$

where the sites of the subgraph $\mathcal{G}_0^{(N)}$ are given by
$\mathcal{G}_0^{(N)} = \{s \in \mathbb{T} : s \in G_0^{(N)}\}$ and the bonds of
the subgraph are all bonds in $\mathbb{T}$ that connect two points in
$\mathcal{G}_0^{(N)}$.

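We now apply the threshold in the following way. First, we let $\tau = 1/2$, and
then define

$$Y_\tau^{(N)}(s) := \begin{cases} 1, & Y^{(N)}(s) \ge 1/2, \\ 0, & Y^{(N)}(s) < 1/2. \end{cases}$$

We consider the following collection of black pixels

$$\widehat{\mathcal{G}}_0^{(N)} := \{s \in \mathbb{T}^{(N)} : Y_\tau^{(N)}(s) = 1\}. \tag{3}$$

As a side remark, note that one can view $\widehat{\mathcal{G}}_0^{(N)}$ as an
(inconsistent) pre-estimator of $\mathcal{G}_0^{(N)}$. Now recall that we want
to construct a test on the basis of $\widehat{\mathcal{G}}_0^{(N)}$ for the
hypothesis $H_0^{(N)} : \mathcal{G}_0^{(N)} = \emptyset$ against the alternative
$H_1^{(N)} : \mathcal{G}_0^{(N)} \neq \emptyset$.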

Definition 2. (The Maximum-Cluster Test) Let $\phi(N)$ be a suitably chosen
threshold depending on $N$. Let the test statistic $T$ be the size of the largest
connected black cluster $C \subset \widehat{\mathcal{G}}_0^{(N)}$. We reject
$H_0^{(N)}$ if and only if $T \ge \phi(N)$.
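As a rough illustration of the maximum cluster test, the sketch below
binarizes an image at 1/2, finds the largest black cluster and compares it with
a logarithmic threshold. It is a minimal sketch under stated assumptions: the
lattice is stored on a skewed square grid with six neighbors per site, ties
$T = \phi(N)$ count as rejection, and the argument `k0` is a hypothetical
stand-in for the constant in $\phi(N)$, whose numerical value the text does not
give.

```python
import math
import random

# Neighbor offsets for a triangular lattice stored on a skewed square grid
# (an assumed embedding; the text only fixes the lattice type).
TRI_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def largest_cluster(sites):
    """Size of the largest connected cluster in a set of (i, j) sites."""
    remaining = set(sites)
    best = 0
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            x, y = stack.pop()
            size += 1
            for dx, dy in TRI_NEIGHBORS:
                nb = (x + dx, y + dy)
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def maximum_cluster_test(y, k0):
    """Return (reject, T, phi) for an N x N list of noisy intensities y.

    k0 is a hypothetical choice for the constant in phi(N) = k0 * log(N).
    """
    n = len(y)
    # Pixels at or above the threshold 1/2 are marked black.
    black = {(i, j) for i in range(n) for j in range(n) if y[i][j] >= 0.5}
    t = largest_cluster(black)
    phi = k0 * math.log(n)
    return t >= phi, t, phi


if __name__ == "__main__":
    rng = random.Random(1)
    n, sigma = 40, 0.3
    # Hypothetical test image: a 12 x 12 square object plus Gaussian noise.
    y = [[(1.0 if 10 <= i < 22 and 10 <= j < 22 else 0.0)
          + sigma * rng.gauss(0.0, 1.0) for j in range(n)] for i in range(n)]
    print(maximum_cluster_test(y, k0=5.0))
```

Each pixel is visited a bounded number of times, which matches the linear
complexity in the number of pixels claimed for the test.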

For this test, we have the following consistency result under the assumption
that the support of the indicator function satisfies the following very weak
type of shape constraint.

Definition 3. (The Bulk Condition) We say that the support
$\mathcal{G}_0^{(N)}$ of the signal contains a square of side length
$\rho(N) \le N$ if there is a site $s \in \mathcal{G}_0^{(N)}$ such that
$s + \mathbb{T}^{(\rho(N))} \subset \mathcal{G}_0^{(N)}$.

The following consistency result was proved in Langovoy and Wittich (2010a).



Theorem 1. For the maximum cluster test, we have

1. There is some constant $K_0 > 0$ such that for $\phi(N) = K_0 \log N$, we
have for the type I error

$$\lim_{N \to \infty} \alpha(N) = 0.$$

2. Let $\phi(N)$ be as above. Let the support $\mathcal{G}_0^{(N)}$ of the signal
contain squares of side length $\rho(N)$. If $\rho(N) \ge K_0 \log N$, we have
for the type II error

$$\lim_{N \to \infty} \beta(N) = 0.$$

In particular, in the limit of arbitrarily large precision of sampling, the test
will always produce the right detection result.

The next theorem strengthens Theorem 1 and delivers the actual rates of
convergence for both types of testing errors. It is a remarkable fact that both
types of errors in our method tend to zero exponentially fast in terms of the
size of the object of interest. See Davies et al. (2009) or Langovoy and Wittich
(2010a) for the proof.

Theorem 2. Suppose the assumptions of Theorem 1 are satisfied. Then there are
constants $C_1 > 0$, $C_2 > 0$ such that

1. The type I error of the maximum cluster test does not exceed

$$\alpha(N) \le \exp(-C_2\,\phi(N))$$

for all $N > \phi(N)$.

2. The type II error of the maximum cluster test does not exceed

$$\beta(N) \le \exp(-C_1\,\rho(N))$$

for all $N > \rho(N)$.



4. Realistic pictures 

Instead of the above idealized model, in the present paper we consider the
non-distorted signal of interest to be a bounded function
$f \in L^\infty(\mathcal{G})$, i.e. $f(s)$, $s \in \mathcal{G}$, is a collection
of pixel intensities and there exists a $c > 0$ such that $|f(s)| \le c$ for all
$s \in \mathcal{G}$. In the sequel, we will call these functions realistic
pictures.

The underlying model for the noisy signal is now, as in the indicator signal
case, given by

$$Y(s) = f(s) + \sigma\varepsilon(s) \tag{4}$$

and we assume the same properties of the noise as before in Langovoy and Wittich
(2010a). More precisely, we assume the following.



Noise Properties. For a given graph $\mathcal{G}$, the noise is given by random
variables $\{\varepsilon(s) : s \in \mathcal{G}\}$ such that

1. the variables $\varepsilon(s)$ are independent, identically distributed with
$E\varepsilon = 0$ and $V\varepsilon = 1$,

2. the noise distribution is symmetric,

3. the distribution of the noise is non-degenerate with respect to a critical
probability $p_c$, meaning that if $F$ denotes the cumulative distribution
function of the noise and we define

$$m^+ = \inf\{x \in \mathbb{R} : F(x) > 1 - p_c\}, \qquad m^- = \sup\{x \in \mathbb{R} : F(x) < 1 - p_c\},$$

then we have $m^+ = m^-$, where we denote the common value by $m$, and either

$$F(m) > \lim_{h \to 0,\, h > 0} F(m - h), \tag{5}$$

or

$$F'(m) > 0. \tag{6}$$

Furthermore, we assume a bounded detector device, meaning that only signal
intensities $|Y| \le r$ can be properly displayed, and we assume that this is
actually sufficient, i.e. that $|Y| \le r$ for the incoming signal. This is
explained more closely in the appendix.

The test that has to be performed now reads as $H_0 : f = 0$ versus the
alternative $H_1 : f \neq 0$, where we assume, in an analogous way as before,
that $f : [0,1]^2 \to \mathbb{R}$ is a bounded continuous function. Thus, in a
similar fashion as before, we construct tests for different resolutions, namely
for the hypotheses $H_0^{(N)} : f^{(N)} = 0$ against the alternatives
$H_1^{(N)} : f^{(N)} \neq 0$, where the discretized function
$f^{(N)} : \mathbb{T}^{(N)} \to \mathbb{R}$ is given by

$$f^{(N)}(s) = f(x, y), \qquad s = (N + 1/2)\,x + i\,\tfrac{\sqrt{3}}{2} N y, \tag{7}$$

and the corresponding signal is given by

$$Y^{(N)}(s) = f^{(N)}(s) + \sigma\varepsilon(s).$$

We now have to slightly modify the test, in particular since we do not have any
information about the signal strength. This is the main difference from the
situation with the indicator function and also the main reason to introduce a
bounded detector device. By that property (and assuming, as explained before,
that the intensity scale provided by the detector is actually sufficient to
properly display the signal, or, likewise, if we condition on that event) we
have a compact scale of thresholds that has to be explored.

Let now $\tau > 0$ and let $\mathcal{G}_{\tau,+}^{(N)} \subset \mathbb{T}^{(N)}$
denote the super level set

$$\mathcal{G}_{\tau,+}^{(N)} := \{s \in \mathbb{T}^{(N)} : Y^{(N)}(s) > \tau\}$$

and $\mathcal{G}_{\tau,-}^{(N)} \subset \mathbb{T}^{(N)}$ denote the sub level set

$$\mathcal{G}_{\tau,-}^{(N)} := \{s \in \mathbb{T}^{(N)} : Y^{(N)}(s) < -\tau\}.$$

Assume furthermore that the bounded detector device under consideration has
range $r > 0$. As a threshold, we use the same $\phi(N) = K_0 \log N$ as in
Theorem 1. We attempt to do signal detection using the following test statistics

$$T_+^{(N)}(a) := \max\{|C| : C \subset \mathcal{G}_{a,+}^{(N)} \text{ black cluster}\}, \tag{8}$$

$$T_-^{(N)}(a) := \max\{|C| : C \subset \mathcal{G}_{a,-}^{(N)} \text{ black cluster}\}$$

where $a \in [0, r/2]$. It is immediate that we have the following properties as
in the case of indicator functions.

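Lemma 1. Under the null hypothesis, the probabilities that a pixel is
erroneously marked black are

1. $p_E = P(s \in \mathcal{G}_{a,+}^{(N)}) = P(\varepsilon > a/\sigma) < 1/2 = p_c$,

2. $p_E = P(s \in \mathcal{G}_{a,-}^{(N)}) = P(\varepsilon < -a/\sigma) < 1/2 = p_c$,

and hence subcritical.

Lemma 2. (i) Let $Q_+ \subset \{f^{(N)} > a\}$ be a square. Then we have for all
$s \in Q_+$ that

$$p_B = P(s \in \mathcal{G}_{a/2,+}^{(N)}) \ge P(\varepsilon > -a/(2\sigma)) > 1/2 = p_c.$$

(ii) Let $Q_- \subset \{f^{(N)} < -a\}$ be a square. Then we have for all
$s \in Q_-$ that

$$p_B = P(s \in \mathcal{G}_{a/2,-}^{(N)}) \ge P(\varepsilon < a/(2\sigma)) > 1/2 = p_c.$$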

By these two lemmas, we see that for the test statistics considered above, we
are essentially in the same situation as we were in Davies et al. (2009) and
Langovoy and Wittich (2010a). Both previous lemmas would remain valid without
change if we considered the respective models

$$Y_+ = 1_{\{f^{(N)} > a\}} + \frac{\sigma}{a}\,\varepsilon,$$

$$Y_- = 1_{\{f^{(N)} < -a\}} + \frac{\sigma}{a}\,\varepsilon$$

for suitably chosen indicator functions. This implies that we may draw the
following conclusion by applying exactly the same proof as in Theorem 1.

Theorem 3. For the test statistics considered above, we have:

1. There is some constant $K_0 > 0$ such that for $\phi(N) = K_0 \log N$, we have
under the null hypothesis

$$\lim_{N \to \infty} P\bigl(T_+^{(N)}(a) \ge \phi(N)\bigr) = \lim_{N \to \infty} P\bigl(T_-^{(N)}(a) \ge \phi(N)\bigr) = 0.$$

For $K_0$, we may use the same choice as in Theorem 1.

2. Let $\phi(N)$ be as above. Let $Q_+ \subset \{f^{(N)} > a\}$ contain squares
of side length $\rho(N)$. If $\rho(N) \ge K_0 \log N$, we have

$$\lim_{N \to \infty} P\bigl(T_+^{(N)}(a/2) < \phi(N)\bigr) = 0.$$

3. Let $\phi(N)$, $\rho(N)$ be as above. Let $Q_- \subset \{f^{(N)} < -a\}$
contain squares of side length $\rho(N)$. Then we also have

$$\lim_{N \to \infty} P\bigl(T_-^{(N)}(a/2) < \phi(N)\bigr) = 0.$$

In particular, the test statistic associated with the correct scale parameter
$a/2$ will asymptotically always produce the right detection result.

At first sight, the situation seems rather similar to that for indicator
functions in Theorem 1. However, it is completely different in the sense that
the consistency result only holds if we pick the right signal strength in
advance. We might be able to overcome this problem by considering a scale of
tests for some positive $a > 0$.



5. Uncertainty 

It is intuitively clear that, as a matter of principle, it is not possible to
detect a signal with arbitrarily small signal to noise ratio on a lattice of
finite size, no matter which method is used for detection. However, for every
particular method, it is very difficult to provide a "horizon of consistency" in
explicit form. Results of this type are very rare in hypothesis testing, image
analysis or machine learning. Typically, one proves those results in special
cases like Gaussian errors.

In this section, we provide an explicit, closed-form, non-asymptotic and
nonparametric consistency bound for our method. This bound is valid for a wide
class of nonparametric noise distributions and is given in Theorem 4.

Recall from the proof of Theorem 1 that the constant $K_0$ in the threshold was
given by the inequality

$$K_0 = 2C > 2\lambda(p_E)^{-1}$$

where $p_E$ is the (subcritical) probability under the null hypothesis that a
pixel is erroneously marked black and $\lambda(p_E)$ is the constant from the
Aizenman-Newman theorem, see Langovoy and Wittich (2010a). Thus, we have to
begin by finding a proper estimate of $\lambda(p)$ for a subcritical $p$.
The classical Aizenman-Newman theorem reads as follows.



Proposition 1. (Aizenman-Newman Theorem) Consider percolation with subcritical
probability $p < p_c = 1/2$ on the infinite triangular lattice $\mathbb{T}$. Then
there is a constant $\lambda(p) > 0$ depending on $p$ such that

$$P(|C| \ge n) \le e^{-n\lambda(p)} \tag{9}$$

for all $n \ge 1$, where $C$ denotes the black cluster containing the origin.

Remark. Please note that we use asymptotics-oriented estimates to prove
statements about finite lattices. For instance, in the case of (10) below, these
estimates are not the best possible. So we can by no means expect that the bound
in Theorem 4 is sharp. But it is good enough to serve as an illustration of the
basic principle.

In the sequel, $\chi(p)$ denotes the expected size of the cluster containing
$0 \in \mathbb{T}$ in the infinite lattice, depending on the subcritical
occupation probability $p < p_c = 1/2$.

Lemma 3. For the infinite triangular lattice, we have

$$\chi(p) \ge \frac{1}{18}\,|p - p_c|^{-1}. \tag{10}$$

Proof: See Appendix 7.3. ■

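By the Aizenman-Newman Theorem (Proposition 1), we obtain an upper bound for
this expectation value by

$$\chi(p) = \sum_{n \ge 1} P(|C| \ge n) \le \sum_{n \ge 1} e^{-n\lambda(p)} = \frac{e^{-\lambda(p)}}{1 - e^{-\lambda(p)}}.$$

Thus, we have

$$\lambda(p) \le -\log\left(\frac{\chi(p)}{1 + \chi(p)}\right) = \log\left(1 + \frac{1}{\chi(p)}\right). \tag{11}$$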

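Combining these two results yields

Lemma 4. We have

$$\lambda(p) \le \log\bigl(1 + 18\,|p - p_c|\bigr).$$

Proof: (10) together with (11) implies

$$\lambda(p) \le -\log\left(\frac{\chi(p)}{1 + \chi(p)}\right) = \log\left(1 + \frac{1}{\chi(p)}\right) \le \log\bigl(1 + 18\,|p - p_c|\bigr).$$

■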
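For readers who want the intermediate algebra, the inversion of the
geometric-series bound can be spelled out as follows. This only expands the
step leading to (11); the shorthand $\lambda = \lambda(p)$ and
$\chi = \chi(p)$ is introduced here for brevity.

```latex
\begin{align*}
\chi \;\le\; \sum_{n \ge 1} e^{-n\lambda}
      \;=\; \frac{e^{-\lambda}}{1 - e^{-\lambda}}
      \;=\; \frac{1}{e^{\lambda} - 1}
\quad &\Longrightarrow \quad e^{\lambda} - 1 \;\le\; \frac{1}{\chi} \\
\quad &\Longrightarrow \quad \lambda \;\le\; \log\Bigl(1 + \frac{1}{\chi}\Bigr)
      \;=\; -\log\Bigl(\frac{\chi}{1 + \chi}\Bigr).
\end{align*}
```

The bound of Lemma 4 then follows as soon as $1/\chi(p) \le 18\,|p - p_c|$,
which is the form in which the estimate on $\chi(p)$ enters.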



This implies that an intrinsic feature of the procedure is the following form of
uncertainty: by our procedure, we cannot, even in principle, detect signals
with arbitrarily low signal to noise ratio on a finite lattice of fixed size.



Of course, it is clear that something like the above statement is valid for any
statistical testing procedure. Therefore, something similar is also valid for
signal detection. An important distinction of our uncertainty result is that we
can give an explicit condition on the noise level and the lattice size, such
that this condition implies that our test does not work. Results of this type
are very rare both in statistical literature and in image analysis research. To
the best of our knowledge, explicit uncertainty bounds were proved only for
Gaussian errors (for example, in research on wavelets by Donoho and coauthors).
Our uncertainty relation is much stronger, because it holds irrespective of the
actual noise distribution, uniformly over a wide class of nonparametric error
distributions.

To be precise, we consider again the threshold $\phi(N) = K_0 \log N$ and the
fact that in the proof of Theorem 1 we had to choose
$K_0 = 2C > 2\lambda(p_E)^{-1}$. Together with Lemma 4, this implies that

$$\phi(N) = K_0 \log N > \lambda(p_E)^{-1} \log N^2 \ge \frac{\log N^2}{\log\bigl(1 + 18\,|p_E - p_c|\bigr)}. \tag{12}$$

But that means that for values of $p$ which are very close to the critical
probability, the threshold $\phi(N)$ may exceed the lattice size $N^2$ and our
method breaks down. To be precise, we have the following statement.

Proposition 2. If the lattice size $N^2$ is fixed, the threshold $\phi(N)$ is
larger than the lattice size, and therefore the null hypothesis will never be
rejected, if we have

$$|p_E - p_c| < \frac{1}{18}\left\{(N^2)^{1/N^2} - 1\right\}.$$

Proof: By (12), we have $\phi(N) > N^2$ if

$$\frac{\log N^2}{\log\bigl(1 + 18\,|p_E - p_c|\bigr)} \ge N^2.$$

■
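To get a feeling for the size of this bound, the following sketch evaluates it
numerically. It assumes that the bound of Proposition 2 reads
$\frac{1}{18}\{(N^2)^{1/N^2} - 1\}$ and that the right-hand side of (12) is
$\log N^2 / \log(1 + 18\,|p_E - p_c|)$; the function names `critical_gap` and
`threshold_lower_bound` are hypothetical and introduced only for this
illustration.

```python
import math


def critical_gap(n):
    """(1/18) * ((N^2)^(1/N^2) - 1) for a lattice with N^2 sites."""
    sites = n * n
    # expm1 keeps precision when log(sites) / sites is tiny.
    return math.expm1(math.log(sites) / sites) / 18.0


def threshold_lower_bound(n, gap):
    """Lower bound log(N^2) / log(1 + 18 * gap) on the threshold phi(N)."""
    return math.log(n * n) / math.log1p(18.0 * gap)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        gap = critical_gap(n)
        # At half the critical gap the bound on phi(N) already exceeds N^2.
        exceeds = threshold_lower_bound(n, 0.5 * gap) > n * n
        print(f"N = {n}: critical gap = {gap:.3e}, phi(N) > N^2: {exceeds}")
```

The critical gap shrinks quickly as the lattice grows, so the breakdown region
around $p_c$ becomes very narrow for images of realistic size.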



Finally, we want to relate this statement directly to the signal strength. Thus,
if $0 < a$, we say that the signal to noise ratio is given by
$\rho = a/\sigma$. Let us now assume that the distribution function $F$ of the
noise is continuous at zero. Then

$$|p_E - p_c| = \tfrac{1}{2} - p_E = P(0 < \varepsilon \le a/\sigma) = F(\rho) - F(0)$$

and we finally obtain

Theorem 4. (Uncertainty) Assume that the distribution function of the noise is
continuous at zero. A signal $f^{(N)}$ with $|f^{(N)}| \le a$ and signal to noise
ratio $\rho = a/\sigma$ can only be detected by our method if

$$\frac{P(0 < \varepsilon \le \rho)}{(N^2)^{1/N^2} - 1} \ge \frac{1}{18},$$

that is, if either the lattice size is sufficiently large or the signal to
noise ratio is sufficiently large.
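As a worked illustration of the uncertainty bound, the sketch below computes
the smallest signal to noise ratio that is not excluded for a given lattice
size. It assumes standard Gaussian noise purely for concreteness (the theorem
itself is nonparametric) and reads the condition as
$P(0 < \varepsilon \le \rho) \ge \frac{1}{18}\{(N^2)^{1/N^2} - 1\}$; the names
`gaussian_mass`, `required_mass` and `minimal_snr` are hypothetical.

```python
import math


def gaussian_mass(rho):
    """P(0 < eps <= rho) for standard Gaussian noise (illustrative choice)."""
    return 0.5 * math.erf(rho / math.sqrt(2.0))


def required_mass(n):
    """Right-hand side of the necessary condition for N^2 lattice sites."""
    sites = n * n
    return math.expm1(math.log(sites) / sites) / 18.0


def minimal_snr(n, mass=gaussian_mass, hi=10.0, tol=1e-12):
    """Smallest rho with mass(rho) >= required_mass(n), found by bisection.

    Assumes mass is nondecreasing and mass(hi) already meets the requirement.
    """
    target = required_mass(n)
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(f"N = {n}: signals with rho below {minimal_snr(n):.3e} are excluded")
```

The value returned is only a necessary threshold: a signal to noise ratio above
it is not guaranteed to be detectable, it is merely not ruled out by the bound.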

Remark. (i) Note that this statement only means that, as a matter of principle,
we cannot detect signals of arbitrarily small strength on a finite lattice of a
given size. That does not at all mean that detection of signals that respect
the bound above is automatically possible in an effective way. Topics like type
I and type II error are not at all touched by the uncertainty bound. In other
words, from the uncertainty relation we can derive only a necessary condition
for the signal to be detectable via our method. Usually this condition will not
be sufficient.

(ii) From studying the behavior of the function

$$s(x) = \frac{1}{18}\left(e^{-x \log x} - 1\right)$$

on the unit interval, we see that the bound is always fulfilled if
$P(0 < \varepsilon \le \rho) \ge 0.025 \approx \max_{x \in [0,1]} s(x)$.

The proof of Theorem 4 consists of a simple reformulation of the preceding
proposition and is therefore omitted. However, we still have to justify why we
use the word uncertainty in connection with this statement. This discussion can
only be purely informal. The analogy simply is that a function of the signal to
noise ratio times another function of the lattice size has to exceed a certain
value for a signal to be detectable. Otherwise, the signal is virtually
nonexistent. A weaker version of the statement might provide another argument:
there is some number $M > 0$ such that

$$s(x) \le M\sqrt{x}$$

for all $x \in [0,1]$. If we assume now that $F$ has a continuous and
sufficiently smooth density $f$ with $f''(0) < 0$, we have the weaker statement
that the signal can be detected only if

$$f(0)\,\rho \ge F(\rho) - F(0) \ge M\sqrt{1/N^2}$$

or, if

$$N\rho \ge \frac{M}{f(0)}. \tag{13}$$

Thus, for a signal to be detectable, the product of two conjugate parameters
must not fall below a bound given by the circumstances. Otherwise, the signal is
not detectable, even in principle.



6. Multiple testing for realistic pictures 

By the uncertainty principle, we obtain a minimal threshold value below which
it does not make any sense to try to detect a signal. So there is a natural
lower bound $\tau_0$ for a threshold. The upper bound is provided by the size
$r$ of the bounded detector device. That means the range of intensities of
detectable signals is $[-r, -\tau_0] \cup [\tau_0, r]$. Thus, if $f$ is the
signal, and we assume bulk conditions for the super-level sets as in Corollary
1, taking into account the simple monotonicity property that $a > a'$ implies
$1_{\{f^{(N)} > a\}} \le 1_{\{f^{(N)} > a'\}}$, we will certainly be able to
consistently detect an object (if the object can be potentially detected on the
basis of percolation clusters), via the following scheme:

1. Consider the threshold scheme

$$a_k = 2^{-k} r, \qquad k = 1, \dots, N.$$

2. Beginning with $a = a_1$, calculate the test statistics $T_+^{(N)}(a)$,
$T_-^{(N)}(a)$. Terminate if either the null hypothesis is rejected (at a
properly adjusted level, if necessary) or if you reach $a_k$ with
$k \ge \log(r/\tau_0)$.
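The two steps of the scheme can be sketched as follows. This is a minimal
illustration under several assumptions that go beyond the text: the lattice is
stored on a skewed square grid with six neighbors per site, the logarithm in
the stopping rule is read as base 2 (so the loop stops once $a_k$ drops below
the lower bound), no multiplicity adjustment of the level is made, and the
parameters `tau0` and `k0` are hypothetical inputs standing in for the lower
threshold bound and the constant in $\phi(N)$.

```python
import math

# Neighbor offsets for a triangular lattice stored on a skewed square grid
# (an assumed embedding).
TRI_NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def largest_cluster(sites):
    """Size of the largest connected cluster in a set of (i, j) sites."""
    remaining = set(sites)
    best = 0
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            x, y = stack.pop()
            size += 1
            for dx, dy in TRI_NEIGHBORS:
                nb = (x + dx, y + dy)
                if nb in remaining:
                    remaining.remove(nb)
                    stack.append(nb)
        best = max(best, size)
    return best


def multiple_test(y, r, tau0, k0):
    """Run the dyadic threshold scheme a_k = 2**-k * r on an N x N image y.

    Returns the first threshold a_k at which the larger of the two cluster
    statistics reaches phi(N) = k0 * log(N), or None if no threshold
    a_k >= tau0 leads to a rejection.
    """
    n = len(y)
    phi = k0 * math.log(n)
    k = 1
    while 2.0 ** -k * r >= tau0:
        a = 2.0 ** -k * r
        # Super level set {Y > a} and sub level set {Y < -a}.
        above = {(i, j) for i in range(n) for j in range(n) if y[i][j] > a}
        below = {(i, j) for i in range(n) for j in range(n) if y[i][j] < -a}
        if max(largest_cluster(above), largest_cluster(below)) >= phi:
            return a
        k += 1
    return None
```

For example, `multiple_test(y, r=1.0, tau0=0.05, k0=5.0)` scans the thresholds
0.5, 0.25, 0.125 and 0.0625 and stops at the first one for which a large
cluster appears in either level set.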

It can be shown that, under certain conditions on $f$ and $\sigma$, we would
have to repeat the maximum cluster test at most $O(\log N)$ times. Since each
repetition of the maximum cluster test takes $O(N^2)$ operations, the new
multiple testing procedure is going to take at most $O(N^2 \log N)$ operations
overall. Since the input size is $N^2$, this implies that under some conditions
our initial procedure (of linear complexity) slows down by a logarithmic
factor. Asymptotically, this is only slightly slower than the original test,
but the new test is adaptive with respect to the unknown image color intensity.

A point that needs to be addressed carefully here is the probability of false
rejection of the null hypothesis. Indeed, we perform here not a single test, but
a collection of up to $O(\log N)$ tests, and the results of those tests are not
independent. This is a basic question that always occurs in the field of
multiple testing. Luckily for us, for each of the thresholds $a_k$ the direct
analog of Theorem 2 holds: the type I error of any single test tends to zero
exponentially fast, while the power tends exponentially fast to one. Moreover,
our tests are "monotone" with respect to the threshold value (since the maximum
cluster size is an increasing event). This also implies that we have to pay
attention only to those thresholds $a_k$ where at least one of the level sets
$\mathcal{G}_{a_k,+}^{(N)}$ and $\mathcal{G}_{a_k,-}^{(N)}$ doesn't contain black
clusters crossing the whole screen. Using those properties, we will be able to
combine the results of not more than $O(\log N)$ tests $T_+^{(N)}(a_k)$ and
$T_-^{(N)}(a_k)$ and get a unique decision out of them, while keeping the type I
error of the multiple test controlled. We plan to present those results in
succeeding papers.

Acknowledgments. The authors would like to thank the EURANDOM Report Series
reviewers for carefully reading this manuscript.

Appendix.



7. Some facts from percolation theory 

In this section, we collect some basic statements and techniques from the theory
of percolation. In particular, we are going to prove the inequality (10), which
is basic for the introduction of the uncertainty principle.



7.1. FKG and BK inequality 

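Recall the partial ordering

$$\omega_1 \preceq \omega_2 \iff \omega_1(s) \le \omega_2(s) \text{ for all } s \in \mathbb{T}$$

on the set $\Omega = \{0,1\}^{\mathbb{T}}$ of all percolation configurations
introduced in Section 2, and that an event $A \subset \Omega$ is increasing if we
have

$$1_A(\omega_1) \le 1_A(\omega_2)$$

for the corresponding indicator variable whenever $\omega_1 \preceq \omega_2$.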

The FKG inequality was already stated before and is just added here another 
time for completeness. 

Proposition 3. (FKG inequality) If $A$ and $B$ are both increasing (or both
decreasing) events, then we have

$$P(A \cap B) \ge P(A)P(B).$$

Proof: Fortuin et al. (1971). ■

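Let $\mathcal{G} \subset \mathbb{T}$ be a finite subgraph and

$$\mathcal{F}_{\mathcal{G}} := \sigma(\{0,1\}^{\mathcal{G}}) \subset \sigma(\{0,1\}^{\mathbb{T}}) =: \mathcal{F}_{\mathbb{T}}$$

the sub sigma-algebra associated to the percolation configurations on
$\mathcal{G}$ (in the canonical version). Let now
$A, B \in \mathcal{F}_{\mathcal{G}}$ be two increasing events. We define the
support of $\omega \in \{0,1\}^{\mathcal{G}}$ to be

$$\operatorname{supp}\omega := \{s \in \mathcal{G} : \omega(s) = 1\},$$

and for a subset $H \subset \operatorname{supp}\omega$, we write

$$\omega|_H(s) := \begin{cases} 1, & s \in H, \\ 0, & \text{else}. \end{cases}$$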



Definition 4. Let A, B be as above. The event A o B that A and B occur 
disjointly is given by 

A o B := {u G {0, 1} T : 3 H ( w)esuppw u\ H (u) G A, w| sup p W _#( w ) G £}. 

The BK inequality now reads as follows. 

Proposition 4. (BK inequality) Let $A, B \in \mathcal{F}_{\mathcal{G}}$ be increasing events. Then

$$P(A \circ B) \le P(A)P(B).$$

Proof: Grimmett (1999), p. 38 ff. ■
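
Both inequalities can be checked by brute force on a toy example. The following sketch is only an illustration under stated assumptions: the five-site set, the two increasing events `in_A` and `in_B`, and the value p = 2/5 are hypothetical choices that do not come from the paper. It enumerates all configurations, evaluates the disjoint occurrence directly from Definition 4, and compares the three probabilities in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations, product

SITES = range(5)  # hypothetical toy set of five sites (assumption)

def in_A(support):
    # Increasing event: sites {0, 1} are both occupied, or sites {2, 3} are.
    return {0, 1} <= support or {2, 3} <= support

def in_B(support):
    # Increasing event: sites {1, 2} are both occupied, or sites {3, 4} are.
    return {1, 2} <= support or {3, 4} <= support

def subsets(support):
    items = sorted(support)
    for k in range(len(items) + 1):
        for chosen in combinations(items, k):
            yield frozenset(chosen)

def occurs_disjointly(support):
    # Definition 4: some H inside supp(omega) with omega|_H in A
    # and omega|_{supp(omega) - H} in B.
    return any(in_A(H) and in_B(support - H) for H in subsets(support))

def probability(event, p):
    # Product measure with occupation probability p, by full enumeration.
    total = Fraction(0)
    for omega in product((0, 1), repeat=len(SITES)):
        support = frozenset(s for s in SITES if omega[s] == 1)
        if event(support):
            k = len(support)
            total += p**k * (1 - p) ** (len(SITES) - k)
    return total

p = Fraction(2, 5)
P_A = probability(in_A, p)
P_B = probability(in_B, p)
P_AB = probability(lambda S: in_A(S) and in_B(S), p)
P_AoB = probability(occurs_disjointly, p)
print(P_AoB, "<=", P_A * P_B, "<=", P_AB)
```

For this example the chain P(A ∘ B) ≤ P(A)P(B) ≤ P(A ∩ B) holds, as Propositions 3 and 4 predict. The enumeration is exponential in the number of sites, so it is useful only as a sanity check on very small graphs.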



7.2. Russo's formula 

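Let $s \in \mathcal{T}$ be a site. We consider the involution $j_s : \{0,1\}^{\mathcal{T}} \to \{0,1\}^{\mathcal{T}}$ given by

$$j_s(\omega)(s') := \begin{cases} \omega(s') & s' \ne s, \\ 1 - \omega(s') & s' = s. \end{cases}$$

From this definition, we see that the configuration space is a disjoint union
$\{0,1\}^{\mathcal{T}} = \Omega(s)^+ \cup j_s\Omega(s)^+$, where

$$\Omega(s)^+ := \{\omega \in \{0,1\}^{\mathcal{T}} : \omega(s) = 1\}.$$

Definition 5. (Pivotal sites) Let $\mathcal{G} \subset \mathcal{T}$ be a finite subgraph and $A \in \mathcal{F}_{\mathcal{G}}$ be
an increasing event. The event that the site $s$ is pivotal for $A$ is given by

$$\operatorname{Piv}(A, s) := \{\omega \in \{0,1\}^{\mathcal{T}} : 1_A(\omega) \ne 1_A \circ j_s(\omega)\}.$$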

Russo's formula is a statement about how the probability of a certain event
changes if the individual site occupation probability $p$ is changed. We denote by
$P_p(A)$ the probability of the event $A$ if this probability is $p$ and by

$$N_A := \sum_{s \in \mathcal{G}} 1_{\operatorname{Piv}(A,s)}$$

the number of pivotal elements for $A$.

Proposition 5. (Russo's formula) Let $\mathcal{G} \subset \mathcal{T}$ be a finite subgraph and
$A \in \mathcal{F}_{\mathcal{G}}$ be an increasing event. Then

$$\frac{d}{dp} P_p(A) = E_p N_A. \qquad (14)$$

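Proof: (i) First of all, since $A$ is increasing and $\omega \succeq j_s(\omega)$ for all $\omega \in \Omega(s)^+$,
we have on the set $\Omega(s)^+$

$$(1_A - 1_A \circ j_s)(\omega) = \begin{cases} 1 & \omega \in \operatorname{Piv}(A,s) \cap \Omega(s)^+, \\ 0 & \text{else} \end{cases} = 1_{\operatorname{Piv}(A,s) \cap \Omega(s)^+}(\omega).$$

(ii) In the sequel, we write $\Omega(s)^- = j_s\Omega(s)^+$. Let $P|_{\Omega(s)^-}$ denote the restriction
of the measure to $\Omega(s)^-$. Then, the image measure under $j_s$ is a measure on
$\Omega(s)^+$ with density

$$\frac{dP|_{\Omega(s)^-} \circ j_s}{dP} = \frac{P(\Omega(s)^-)}{P(\Omega(s)^+)}.$$

That implies

$$E_p[1_A \mid \Omega(s)^-] = \frac{\int_{\Omega(s)^-} 1_A(\omega)\,P(d\omega)}{P(\Omega(s)^-)} = \frac{\int_{\Omega(s)^+} 1_A \circ j_s(\omega')\,P \circ j_s(d\omega')}{P(\Omega(s)^-)} = E_p[1_A \circ j_s \mid \Omega(s)^+].$$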

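(iii) Now let $p' > p$ and denote by $E_{p',s}$ the expectation with respect to the
product measure $P_{p',s}$ with marginals

$$p(s') = \begin{cases} p' & s' = s, \\ p & s' \ne s. \end{cases}$$

Thus

$$\begin{aligned}
P_{p',s}(A) - P_p(A) &= E_{p',s} 1_A - E_p 1_A \\
&= P_{p',s}(\Omega(s)^+)\,E_{p',s}[1_A \mid \Omega(s)^+] + P_{p',s}(\Omega(s)^-)\,E_{p',s}[1_A \mid \Omega(s)^-] \\
&\quad - P_p(\Omega(s)^+)\,E_p[1_A \mid \Omega(s)^+] - P_p(\Omega(s)^-)\,E_p[1_A \mid \Omega(s)^-] \\
&= (p' - p)\,E_p[1_A - 1_A \circ j_s \mid \omega \in \Omega(s)^+] + E_{p',s}[1_A \mid \Omega(s)^-] - E_p[1_A \mid \Omega(s)^-] \\
&= (p' - p)\,E_p[1_{\operatorname{Piv}(A,s) \cap \Omega(s)^+} \mid \omega \in \Omega(s)^+] \\
&= (p' - p)\,E_p[1_{\operatorname{Piv}(A,s)}] = (p' - p)\,P_p(\operatorname{Piv}(A,s)).
\end{aligned}$$

That implies finally

$$\frac{\partial P_p(A)}{\partial p(s)} = \lim_{p' \downarrow p} \frac{P_{p',s}(A) - P_p(A)}{p' - p} = P_p(\operatorname{Piv}(A,s)).$$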

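(iv) By $A \in \mathcal{F}_{\mathcal{G}}$, we have

$$E_p 1_A = E_p[1_A \mid \mathcal{F}_{\mathcal{G}}] = \sum_{\omega \in \{0,1\}^{\mathcal{G}}} 1_A(\omega) \prod_{s \in \mathcal{G}} P_p(\omega(s)),$$

that means, we may think of the distribution $P_p$ as a distribution depending on
finitely many real parameters $\{p(s) : s \in \mathcal{G}\}$. That implies together with (iii)

$$\frac{d}{dp} P_p(A) = \sum_{s \in \mathcal{G}} \frac{\partial P_p(A)}{\partial p(s)} = \sum_{s \in \mathcal{G}} P_p(\operatorname{Piv}(A,s)) = E_p \sum_{s \in \mathcal{G}} 1_{\operatorname{Piv}(A,s)} = E_p N_A.$$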
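
Russo's formula can be verified numerically on a small example. The sketch below is an illustration under stated assumptions: the five-site configuration space and the increasing event `in_A` are hypothetical and not taken from the paper. It compares a central finite-difference approximation of the derivative of P_p(A) with the expected number of pivotal sites, computed by full enumeration.

```python
from itertools import product

N_SITES = 5  # hypothetical toy graph with five sites (assumption)

def in_A(omega):
    # Increasing event: sites 0 and 1 occupied, or sites 2, 3 and 4 occupied.
    return bool((omega[0] and omega[1]) or (omega[2] and omega[3] and omega[4]))

def flip(omega, s):
    # The involution j_s: switch the state of site s, leave the rest unchanged.
    return omega[:s] + (1 - omega[s],) + omega[s + 1:]

def weight(omega, p):
    k = sum(omega)
    return p**k * (1 - p) ** (len(omega) - k)

def prob_A(p):
    return sum(weight(w, p) for w in product((0, 1), repeat=N_SITES) if in_A(w))

def expected_pivotal(p):
    # E_p N_A, where N_A counts the sites s with 1_A(omega) != 1_A(j_s(omega)).
    total = 0.0
    for w in product((0, 1), repeat=N_SITES):
        n_pivotal = sum(in_A(w) != in_A(flip(w, s)) for s in range(N_SITES))
        total += n_pivotal * weight(w, p)
    return total

p, h = 0.3, 1e-5
derivative = (prob_A(p + h) - prob_A(p - h)) / (2 * h)
print(derivative, expected_pivotal(p))
```

For this event P_p(A) = p^2 + p^3 - p^5, so both numbers should agree with 2p + 3p^2 - 5p^4 up to the finite-difference error.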



7.3. The proof of (10)

We follow closely the proof in Grimmett (1999), p. 263 ff. Let $P_p(x,y)$ denote
the probability that there is a path connecting the sites $x$ and $y$ and $P_p^{(N)}(x,y)$
the probability that there is a path connecting $x$ and $y$ which lies entirely in
$\mathcal{T}^{(N)}$. Now

$$\chi_N(p,y) := \sum_{x \in \mathcal{T}^{(N)}} P_p^{(N)}(x,y)$$

is the expected size of the connected cluster around $y$ in $\mathcal{T}^{(N)}$ and

$$\chi(p,y) := \sum_{x \in \mathcal{T}} P_p(x,y)$$

the expected cluster size in $\mathcal{T}$. Note that $\chi(p) = \chi(p,0)$. Furthermore, we write
$\chi_N(p) := \max\{\chi_N(p,y) : y \in \mathcal{T}^{(N)}\}$.
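
To make these definitions concrete, the following sketch computes the expected cluster size exactly on a very small patch of the triangular lattice. Everything specific here is an assumption made for illustration: the patch consists of the origin and its six neighbours, the lattice is written in axial coordinates, and the names `PATCH`, `cluster` and `chi_N` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Neighbour offsets of the triangular lattice in axial coordinates (assumed layout).
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
# Hypothetical finite patch: the origin together with its six neighbours.
PATCH = [(0, 0)] + NEIGHBOURS

def cluster(occupied, y):
    """Sites joined to y by a path of occupied sites (empty if y is vacant)."""
    if y not in occupied:
        return set()
    found, stack = {y}, [y]
    while stack:
        i, j = stack.pop()
        for di, dj in NEIGHBOURS:
            site = (i + di, j + dj)
            if site in occupied and site not in found:
                found.add(site)
                stack.append(site)
    return found

def chi_N(p, y):
    """Expected cluster size at y, i.e. the sum over x of P_p^(N)(x, y)."""
    total = Fraction(0)
    for omega in product((0, 1), repeat=len(PATCH)):
        occupied = {s for s, bit in zip(PATCH, omega) if bit}
        k = len(occupied)
        total += len(cluster(occupied, y)) * p**k * (1 - p) ** (len(PATCH) - k)
    return total

p = Fraction(1, 3)
print(chi_N(p, (0, 0)), max(chi_N(p, y) for y in PATCH))
```

On this patch every site is adjacent to the origin, so the value at the origin is p + 6p^2, and no other site can exceed it because each pair of distinct sites is connected with probability at most p^2.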

(i) First of all,

$$\chi(p) \ge \chi_N(p) \ge \chi_N(p,0) = \sum_{x \in \mathcal{T}^{(N)}} P_p^{(N)}(x,0) \longrightarrow \sum_{x \in \mathcal{T}} P_p(x,0) = \chi(p)$$

implies that we have by bounded convergence

$$\lim_{N \to \infty} \chi_N(p) = \chi(p).$$

(ii) Denote by $A_N(x,y)$ the event that there is a path connecting $x$ and $y$ in
$\mathcal{T}^{(N)}$. Then, by Russo's formula,

$$\frac{d}{dp}\chi_N(p,y) = \sum_{x \in \mathcal{T}^{(N)}} \sum_{s \in \mathcal{T}^{(N)}} P_p(\operatorname{Piv}(A_N(x,y),s)).$$

A site $s \in \mathcal{T}^{(N)}$ is now pivotal for $A_N(x,y)$ if and only if

1. $s$ is adjacent to two different and non-adjacent sites $x'$ and $y'$.

2. There is a path connecting $x$ and $x'$.

3. There is a disjoint path connecting $y$ and $y'$, meaning that no site in this
path is adjacent to any site in the path connecting $x$ and $x'$.

This means that switching $s$ on or off will switch a connection between $x$ and
$y$ on or off (which changes the value of the corresponding indicator function).
Having disjoint paths between different pairs of sites is a typical example of a
disjointly occurring event and therefore we can write the three conditions above
shortly by saying that for all $x, y \ne s \in \mathcal{T}^{(N)}$ and all $x' \ne y'$ adjacent to and
different from $s$, we have

$$A_N(x,x') \circ A_N(y,y') \subset \operatorname{Piv}(A_N(x,y),s)$$

and that on the other hand

$$\operatorname{Piv}(A_N(x,y),s) = \bigcup_{x' \ne y' \text{ adjacent to } s} A_N(x,x') \circ A_N(y,y').$$

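That implies by the BK inequality

$$P_p(\operatorname{Piv}(A_N(x,y),s)) \le \sum_{x' \ne y' \text{ adjacent to } s} P_p^{(N)}(x,x')\,P_p^{(N)}(y,y').$$

Finally inserting this into Russo's formula yields

$$\begin{aligned}
\frac{d}{dp}\chi_N(p,y) &= \sum_{x \in \mathcal{T}^{(N)}} \sum_{s \in \mathcal{T}^{(N)}} P_p(\operatorname{Piv}(A_N(x,y),s)) \\
&\le \sum_{x \in \mathcal{T}^{(N)}} \sum_{s \in \mathcal{T}^{(N)}} \sum_{x' \ne y' \text{ adjacent to } s} P_p^{(N)}(x,x')\,P_p^{(N)}(y,y') \\
&= \sum_{s \in \mathcal{T}^{(N)}} \sum_{x' \ne y' \text{ adjacent to } s} \chi_N(p,x')\,P_p^{(N)}(y,y') \\
&\le \chi_N(p) \sum_{s \in \mathcal{T}^{(N)}} \sum_{x' \ne y' \text{ adjacent to } s} P_p^{(N)}(y,y') \\
&= 3\,\chi_N(p) \sum_{s \in \mathcal{T}^{(N)}} \sum_{y' \text{ adjacent to } s} P_p^{(N)}(y,y') \\
&= 3 \times 6\,\chi_N(p) \sum_{s \in \mathcal{T}^{(N)}} P_p^{(N)}(y,s) \\
&= 18\,\chi_N(p)\,\chi_N(p,y) \le 18\,\chi_N(p)^2.
\end{aligned}$$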
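
The constant 18 = 3 × 6 in this chain comes from counting neighbours on the triangular lattice. The short check below is a sketch under stated assumptions: it uses axial coordinates for the lattice, and it reads the sum over x' ≠ y' as running over pairs of neighbours of s that are also non-adjacent to each other, as in condition 1 above. For each neighbour y' of a site it counts the admissible neighbours x'.

```python
# Six neighbours of the origin on the triangular lattice, in axial coordinates
# (an assumed coordinate choice; the paper does not fix one).
NEIGHBOURS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def adjacent(u, v):
    # Two sites are adjacent if their difference is a neighbour offset.
    return (u[0] - v[0], u[1] - v[1]) in NEIGHBOURS

# For each neighbour y' of the origin, list the neighbours x' that are
# different from y' and not adjacent to y'.
admissible = {
    y: [x for x in NEIGHBOURS if x != y and not adjacent(x, y)]
    for y in NEIGHBOURS
}
choices_per_y = {len(xs) for xs in admissible.values()}
total_pairs = sum(len(xs) for xs in admissible.values())
print(choices_per_y, total_pairs)
```

Each y' leaves exactly three choices for x', which gives the factor 3, and every site has six neighbours, which gives the factor 6 when the sum over pairs (s, y') is reindexed as a sum over sites.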

(iii) Integrating this differential inequality over the interval $[p, p_c]$ yields

$$\frac{1}{\chi_N(p)} - \frac{1}{\chi_N(p_c)} \le 18\,(p_c - p)$$

(for details, see the above-mentioned proof in Grimmett (1999)). By (i), $\chi_N \to \chi$, and
together with the fact that $\chi(p_c) = \infty$ we finally obtain, for $p < p_c$,

$$\chi(p) \ge \frac{1}{18\,(p_c - p)}.$$
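
The integration step can be spelled out as follows. This is a sketch rather than a full argument: it assumes $p < p_c$, and it treats $\chi_N$ as differentiable, which holds except at finitely many points because $\chi_N$ is a maximum of finitely many polynomials in $p$.

```latex
% Sketch of step (iii), assuming p < p_c.
\begin{align*}
\frac{d}{dp}\left(-\frac{1}{\chi_N(p)}\right)
  &= \frac{1}{\chi_N(p)^2}\,\frac{d}{dp}\chi_N(p) \le 18, \\
\frac{1}{\chi_N(p)} - \frac{1}{\chi_N(p_c)}
  &= \int_p^{p_c} \frac{d}{dt}\left(-\frac{1}{\chi_N(t)}\right) dt
   \le 18\,(p_c - p), \\
\frac{1}{\chi(p)}
  &= \lim_{N \to \infty}\left(\frac{1}{\chi_N(p)} - \frac{1}{\chi_N(p_c)}\right)
   \le 18\,(p_c - p).
\end{align*}
```

The last line uses step (i) twice: once for $\chi_N(p) \to \chi(p)$ and once for $\chi_N(p_c) \to \chi(p_c) = \infty$, so that the second term vanishes in the limit.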


7.4. Matching graphs and $p_c = 1/2$

In this subsection, we will briefly review the material from Sykes and Essam
(1964) about site percolation and matching graphs. We start with a finite graph
$\mathcal{G}$ with $N$ sites. The probability that a site is marked active (or black) is given
by $p$, the probability that it is marked inactive (or white) is $q = 1 - p$. Denote
a connected cluster of black points by $C$ and its boundary by

$$\partial C := \{s \in \mathcal{G} - C : s \text{ is adjacent to some site } s' \in C\}.$$

That means, the expected cluster size is a polynomial in $p$ and $q$ given by

$$K(p,q,\mathcal{G}) = E|C| = \sum_{C \subset \mathcal{G}} |C|\,p^{|C|} q^{|\partial C|}.$$

By reversing the roles of $p$ and $q$, we obtain the expected numbers of white
clusters. To extend this concept to infinite graphs, we consider the mean cluster
size per site

$$k(p,q,\mathcal{G}) = E(|C|/|\mathcal{G}|) = \frac{1}{|\mathcal{G}|} \sum_{C \subset \mathcal{G}} |C|\,p^{|C|} q^{|\partial C|}, \qquad (15)$$

use a proper exhaustion $\mathcal{G}_1 \subset \mathcal{G}_2 \subset \ldots \subset \mathcal{G}$ of an infinite graph $\mathcal{G}$ and consider
the formal power series

$$k(p,q,\mathcal{G}) = \lim_{k \to \infty} \frac{1}{|\mathcal{G}_k|} \sum_{C \subset \mathcal{G}_k} |C|\,p^{|C|} q^{|\partial C|},$$

which shows that we obtain in this case the expected finite cluster size per site,
taking into account only finite subclusters from $\mathcal{G}$. By

$$k_L(p,\mathcal{G}) = k(p, 1-p, \mathcal{G}), \qquad k_H(q,\mathcal{G}) = k(1-q, q, \mathcal{G}), \qquad (16)$$

we clearly have $k_L(p,\mathcal{G}) = k_H(1-p,\mathcal{G})$ and $k_H(q,\mathcal{G}) = k_L(1-q,\mathcal{G})$.

Definition 6. We call a (possibly infinite) graph $\mathcal{G}$ self-matching, if there is
a polynomial $\varphi_{\mathcal{G}}(p)$ such that

$$k_L(p,\mathcal{G}) = \varphi_{\mathcal{G}}(p) + k_H(p,\mathcal{G}). \qquad (17)$$

$\varphi_{\mathcal{G}}$ is called the matching polynomial.

Theorem 5. The triangular lattice is self-matching with

$$\varphi_{\mathcal{T}}(p) = p - 3p^2 + 2p^3.$$

Proof: See Sykes and Essam (1964). ■
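
A quick consistency check on the stated polynomial: combining (16) and (17) gives k_L(p) - k_L(1 - p) = φ(p), so replacing p by 1 - p shows that a matching polynomial of a self-matching graph must be antisymmetric about p = 1/2. The sketch below verifies this for the polynomial of Theorem 5 in exact arithmetic; the function name `phi` and the evaluation grid are illustrative choices only.

```python
from fractions import Fraction

def phi(p):
    # Matching polynomial of the triangular lattice as stated in Theorem 5.
    return p - 3 * p**2 + 2 * p**3

grid = [Fraction(k, 20) for k in range(21)]
# Factorised form p(1 - p)(1 - 2p), which makes the zero at p = 1/2 visible.
factorised = all(phi(p) == p * (1 - p) * (1 - 2 * p) for p in grid)
# Antisymmetry about p = 1/2, as forced by (16) and (17).
antisymmetric = all(phi(1 - p) == -phi(p) for p in grid)
print(factorised, antisymmetric, phi(Fraction(1, 2)))
```

In particular φ vanishes at p = 1/2, the value that the argument following the theorem identifies as the critical probability.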



When we assume that $k_L(p,\mathcal{T})$ has precisely one pole at the critical percolation
probability $p_c$ for the triangular lattice (see for instance Kesten (1982)), we can
actually use the preceding theorem to determine $p_c$. Here, the special form of
the matching polynomial does not play any role, only the fact that it is a polynomial
and hence bounded on $p \in [0,1]$ is important. Therefore $k_L(p_c,\mathcal{T}) = \infty$
implies $k_H(p_c,\mathcal{T}) = k_L(1-p_c,\mathcal{T}) = \infty$. The assumption that there is only one
pole immediately implies $p_c = 1 - p_c$ and thus $p_c = 1/2$.

Remark. If the graph $\mathcal{G}$ is not self-matching, we can construct a so-called
matching graph $\mathcal{G}^*$ (for the construction, see again Sykes and Essam (1964), or
Kesten (1982)) with the same number of vertices such that instead of (17), we have

$$k_L(p,\mathcal{G}) = \varphi(p) + k_H(p,\mathcal{G}^*), \qquad k_L(p,\mathcal{G}^*) = \varphi^*(p) + k_H(p,\mathcal{G}),$$

together with some relations between $\varphi$ and $\varphi^*$, and these equations can be
used in a similar way as above to obtain some information about the critical
probability. $(\mathcal{G},\mathcal{G}^*)$ is called a matching pair. For self-matching graphs, we have
$\mathcal{G}^* = \mathcal{G}$.



8. Bounded detector devices 

In the discussion of realistic signals, we introduced the notion of a bounded
detector device. In statistical terminology, those devices form an instance of the
method of truncation. A bounded detector device of range $r > 0$ is only able
to display signal strengths $Y(s)$ with intensities between $-r$ and $r$. Thus, the
effect of the detector device on a signal $Y$ is that instead of the full information
about $Y(s)$, $s \in S$, only the information contained in the cutoff signal

$$D(Y) = \max\{\min\{Y, r\}, -r\} \qquad (18)$$

is used for further analysis. Intensities of absolute value larger than $r$ simply
cannot be registered, and all information about the behavior of the signal above and
below the cutoff is lost before the signal processing even starts.
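
In code, the cutoff (18) is a clamp of each intensity to the interval [-r, r]. The following minimal sketch applies it site by site; the function name `detect` and the sample intensities are hypothetical and serve only as an illustration.

```python
def detect(y, r):
    """Cutoff signal D(Y) = max(min(Y, r), -r) for a single intensity value."""
    if r <= 0:
        raise ValueError("the range r must be positive")
    return max(min(y, r), -r)

signal = [-3.2, -0.5, 0.0, 0.7, 1.0, 4.1]  # hypothetical intensities Y(s)
displayed = [detect(y, r=1.0) for y in signal]
print(displayed)
```

Values inside the range pass through unchanged, while -3.2 and 4.1 are both mapped to the nearest bound, so their actual magnitude can no longer be recovered from the displayed signal.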



The detection results in the present paper were proved for bounded signals.
What happens if this assumption doesn't hold? First of all, from a purely
mathematical point of view, the notion of bounded detector devices can be
equivalently reformulated by saying that all considerations are only valid as
statements that are obtained while conditioning on the event $\{|Y| \le r\}$. In
other words, all results are still valid without any change if we understand them
as being obtained by conditioning on the event

$$D_0 := \{D(Y) = Y\}. \qquad (19)$$

Of course, the probability $\pi_D := P(D_0)$ then yields an important characteristic
of the detector device, and it is often desirable to have $\pi_D$ close to one.
However, a deeper analysis of biological, engineering and cybernetic aspects
of the problem leads us to the following extremely useful observation.

We think of signal processing as consisting of at least three different parts, 

1. a filter which has the purpose to transform the incoming signal to fit in 
an optimal way into the bounds of the detector device, 

2. the bounded detector device as described above, and 

3. the processor, which analyses the detected signal $D(Y)$ and determines
what is finally perceived.

We thus arrive at the following scheme 



Signal → Filter → Detector → Processor → Perception



where the detector is the fixed component, the filter is chosen on the basis of 
the incoming signal and the bounds of the detector and the processor algorithm 
is chosen on the basis of knowledge about the detector and the chosen filter. 
Choosing an appropriate filter for a given environment is thus another problem 
of perception, a problem that we will not address in these notes. 
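
The three-stage scheme can be sketched as a composition of functions. Everything below is an illustration under stated assumptions and not the method of the paper: the filter is a simple rescaling rule, the processor is a placeholder that thresholds the detected signal and reports the largest black cluster on a triangular lattice in axial coordinates, and all names and the sample image are hypothetical.

```python
from collections import deque

# Neighbour offsets of the triangular lattice in axial coordinates (assumed layout).
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def apply_filter(signal, r):
    """Hypothetical filter: shrink the signal if its peak exceeds the range r."""
    peak = max(abs(y) for row in signal for y in row)
    scale = r / peak if peak > r else 1.0
    return [[y * scale for y in row] for row in signal]

def detector(signal, r):
    """Bounded detector device of range r: clamp every intensity to [-r, r]."""
    return [[max(min(y, r), -r) for y in row] for row in signal]

def processor(signal, threshold):
    """Placeholder processor: size of the largest cluster of sites above threshold."""
    black = {(i, j) for i, row in enumerate(signal)
             for j, y in enumerate(row) if y > threshold}
    seen, largest = set(), 0
    for start in black:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            i, j = queue.popleft()
            size += 1
            for di, dj in NEIGHBOURS:
                site = (i + di, j + dj)
                if site in black and site not in seen:
                    seen.add(site)
                    queue.append(site)
        largest = max(largest, size)
    return largest

def perceive(signal, r, threshold):
    # Signal -> Filter -> Detector -> Processor -> Perception
    return processor(detector(apply_filter(signal, r), r), threshold)

image = [
    [0.1, 0.2, 0.1, 0.0],
    [0.0, 5.0, 6.0, 0.1],
    [0.2, 4.0, 0.1, 0.0],
    [0.1, 0.0, 0.2, 3.0],
]
print(perceive(image, r=1.0, threshold=0.4))
```

The detector is the only fixed component in this sketch; the filter and the processor are passed the range r and can be swapped without touching it, which mirrors the dependency structure described above.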



Example. For the human eye, we regard only the photoreceptors situated at the
retina as the detector device; the processor is obviously the brain, and the filter
is given by the lens and iris, which adapt to different light intensities, for instance
in night vision. The filter can also consist of those parts together with another
device such as sunglasses.

For visual perception in any biological or cybernetic system, the purpose
of a good filter is exactly to filter out (or to transform) the incoming information
in such a way that the detector can still perceive a reasonable part of reality,
despite the fact that the detector works with signals in the range $[-r, r]$
only. In the example above, a human eye doesn't have to properly perceive
ultraviolet and infrared light frequencies in order to be able to see trees. A
human brain doesn't need to process any information that could come with
ultraviolet and infrared light either.

This implies that our consideration of bounded detector devices fits many
important biological situations. Moreover, working with bounded detector devices
can be profitable for the construction of artificial vision systems in robotics.
A robot needs to perceive and to process only signals and information within
the range that fits its tasks.